1. INTRODUCTION

Remote sensing is the process of acquiring information about an object on the earth from satellites
without making any physical contact [1]. A main goal of remote sensing technology is the segmentation of
objects on the earth using the electromagnetic radiation reflected or emitted by the surface. The increased
spatial and spectral resolution of recently launched satellites has opened new opportunities to use remote
sensing data, and image segmentation is a key step in these applications [2]. Current sensors can generate
hyperspectral data, which consist of many narrow bands such that each pixel has a continuous reflectance
spectrum. Unsupervised image segmentation is an important research topic in hyperspectral imaging, with
the aim of developing efficient algorithms that provide high segmentation accuracy [3].

This paper presents a framework for hyperspectral image segmentation using a clustering algorithm.
The framework segments a hyperspectral data set in three stages. In the first stage, a dimensionality
reduction algorithm removes the bands that convey little information or redundant data. The data acquired
from a hyperspectral sensor usually contain hundreds of bands. This large data volume, together with fine
spectral detail and redundancy between bands, distinguishes hyperspectral data from multispectral data.

Because of this intrinsic dimensionality of the data and the spectral correlation between bands,
dimensionality reduction is important for hyperspectral image segmentation. The dimensionality reduction
step lowers the requirements for processing the hyperspectral data set, such as storage space, computational
load and communication bandwidth, and thus increases the efficiency of the segmentation algorithm. In this
paper, a new dimensionality reduction method based on subset selection is designed to select only the
informative bands and remove redundancy. In the second stage, the informative bands selected in the first
stage are merged into a single image using a hierarchical fusion technique. The main goal of image fusion is
to create a single image that combines the features of all the selected image bands. In the third stage, this
single image is segmented using a hierarchical clustering algorithm. The flow diagram of the proposed
framework is shown in Figure 1. Section 2 presents dimensionality reduction using the subset selection
method, Section 3 presents hierarchical image fusion, Section 4 presents the hierarchical clustering
algorithm, Section 5 presents experimental results and Section 6 presents conclusions.

Figure 1. Framework for hyperspectral image segmentation


2. RESEARCH METHOD 

A hyperspectral data set consists of a stack of strongly correlated images, which means that there is
a large amount of redundant information that needs to be removed before segmentation.
The dimensionality reduction step lowers the requirements for processing the hyperspectral data set,
thus increasing the efficiency of the segmentation algorithm [4]. Dimensionality reduction can be done using
transform based methods or selection based methods [5]. In transform based methods, matrix
transformations project the data into a lower dimensional space, which changes the physical meaning
of the spectral data, so some useful information required for segmentation may be distorted. Selection based
methods directly measure the information content of each individual band and choose the bands with higher
information content. Selection based methods are preferred over transform based methods because they do
not change the meaning of the original data set [6]. This paper presents a new methodology for selection
based dimensionality reduction of hyperspectral data.

Consider a hyperspectral data set with d bands. From these d bands, the proposed methodology finds
the k bands that give the most information for segmentation and discards the remaining (d-k) bands. This set
of k bands contains the smallest number of dimensions that contribute most to the accuracy of segmentation.
The objective of this work is to create subsets of bands based on a similarity criterion, so that the most
similar bands are grouped into one subset. All bands in the same subset are similar with respect to a metric
M, and bands in different subsets are dissimilar with respect to the same metric M. In this way k subsets are
created, and one representative band is selected from each subset. These k bands from the k subsets are used
for further processing in the framework for hyperspectral image segmentation. The remaining (d-k) bands
are discarded, thus achieving dimensionality reduction. In order to create the subsets, three parameters are
required: a metric ‘M’, a reference band ‘I_ref’ and a threshold ‘T’.

Metric ‘M’: A metric M is a measure that directly shows how similar two images are. The metrics used in
the creation of subsets are Average Pixel Intensity [API], Histogram Similarity [HS], Mutual Information
[MI] and Correlation Similarity [CS] [7].

The Average Pixel Intensity [API] of an image I_i of size M × N pixels is calculated according to the
given equation:

$$API_i = \frac{1}{M \times N} \sum_{x=1}^{M} \sum_{y=1}^{N} I_i(x, y) \qquad (1)$$

$$M(I_{ref}, I_i) = \left| API_{ref} - API_i \right|$$

Where M(I_ref, I_i) is the similarity metric and I_ref is the first image of a group.
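
The API computation can be illustrated with a small sketch. It is not the authors' code: it assumes that a band is stored as a list of pixel rows and that the API-based metric is the absolute difference between the average intensities of the reference band and the candidate band. The function names `average_pixel_intensity` and `api_metric` are hypothetical.

```python
def average_pixel_intensity(band):
    """Equation (1): mean of all pixel values of one band (a list of rows)."""
    rows = len(band)
    cols = len(band[0])
    total = sum(sum(row) for row in band)
    return total / (rows * cols)


def api_metric(ref_band, band):
    """Assumed metric: absolute difference of the two average intensities."""
    return abs(average_pixel_intensity(ref_band) - average_pixel_intensity(band))


ref = [[10, 20], [30, 40]]      # toy 2x2 reference band
other = [[12, 22], [32, 42]]    # the same band with every pixel raised by 2

print(average_pixel_intensity(ref))   # 25.0
print(api_metric(ref, other))         # 2.0
```

Under this assumption a small value means that the two bands have nearly the same overall brightness, which says nothing about their spatial structure; this is one reason the other three metrics are also considered.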
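The Mutual Information [MI] between two images A = I_ref and B = I_i can be given by equation:

$$MI(A, B) = \sum_{a, b} p_{AB}(a, b) \log \frac{p_{AB}(a, b)}{p_A(a)\, p_B(b)} \qquad (2)$$

where p_A and p_B are the marginal probability distributions of the pixel intensities of A and B, and p_AB is
their joint distribution.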
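
As an illustration, mutual information can be estimated from the joint histogram of two bands. The sketch below is a minimal example under stated assumptions: bands are lists of rows of integer intensities, probabilities are estimated by counting co-occurring pixel values, and the logarithm is taken to base 2 (the base is not stated in the text). The function name `mutual_information` is hypothetical.

```python
import math
from collections import Counter


def mutual_information(band_a, band_b):
    """Mutual information (in bits) between two equally sized bands."""
    a = [v for row in band_a for v in row]
    b = [v for row in band_b for v in row]
    n = len(a)
    joint = Counter(zip(a, b))   # counts of co-occurring intensity pairs
    count_a = Counter(a)         # marginal counts for band A
    count_b = Counter(b)         # marginal counts for band B
    mi = 0.0
    for (va, vb), count in joint.items():
        p_ab = count / n
        p_a = count_a[va] / n
        p_b = count_b[vb] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


a = [[0, 0], [1, 1]]
c = [[0, 1], [0, 1]]

print(mutual_information(a, a))   # identical bands: 1 bit
print(mutual_information(a, c))   # statistically independent bands: 0 bits
```

Identical bands give the largest value (the entropy of the band), while bands whose intensities are statistically independent give zero.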


The Histogram Similarity [HS] between two images A = I_ref and B = I_i can be given by equation:

$$HS = \sum_{k=1}^{K} \sqrt{h_A(k)\, h_B(k)} \qquad (3)$$

Where K is the number of bins chosen for the histogram computation and h_A, h_B are the normalized
histograms of A and B. A value of HS close to zero indicates that the two histograms are highly disjoint.
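
A histogram similarity with the stated property (close to zero for disjoint histograms) can be sketched as follows. This is only an illustration: it assumes that HS is the sum over bins of the square root of the product of the two normalized histograms, that intensities lie in 0..255, and that bins have equal width. The names `histogram`, `histogram_similarity` and `max_value` are hypothetical.

```python
import math


def histogram(band, bins, max_value=255):
    """Normalized histogram of a band (list of rows) with equal-width bins."""
    counts = [0] * bins
    n = 0
    for row in band:
        for v in row:
            idx = min(int(v * bins / (max_value + 1)), bins - 1)
            counts[idx] += 1
            n += 1
    return [c / n for c in counts]


def histogram_similarity(band_a, band_b, bins=4):
    """Assumed form of HS: sum over bins of sqrt(h_A(k) * h_B(k))."""
    h_a = histogram(band_a, bins)
    h_b = histogram(band_b, bins)
    return sum(math.sqrt(p * q) for p, q in zip(h_a, h_b))


dark = [[0, 10], [20, 30]]            # all pixels fall in the first bin
bright = [[200, 210], [220, 230]]     # all pixels fall in the last bin

print(histogram_similarity(dark, dark))     # 1.0, identical histograms
print(histogram_similarity(dark, bright))   # 0.0, disjoint histograms
```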
The Correlation Similarity [CS] between two images is defined as 

Reference band ‘I_ref’: It always denotes the first band of each subset. It is compared with the other
bands in the subset, keeping similar images in the subset and discarding dissimilar images.

Threshold ‘T’: This threshold determines how similar the images in a subset must be. The higher the
threshold, the fewer bands fall in each subset, which increases the total number of subsets.

Start with the original data set I_j (j = 1, ..., d) and the first subset, subset_i with i = 1, initialized to
∅ (the empty set). Assume I_ref = I_1, the first band in the data set. A band I_j is added to subset_i if
M(I_ref, I_j) < T, that is, if the metric between the reference band and the selected band is less than the
threshold T. Otherwise the band is moved to subset_{i+1}. The subset_{i+1} is created according to the
following formula:

$$M(I_{ref}, I_j) \begin{cases} < T, & \text{assign } I_j \text{ to subset}_i \\ > T, & \text{assign } I_j \text{ to subset}_{i+1}\text{, where it becomes } I_{ref} \text{ for subset}_{i+1} \end{cases} \qquad (5)$$

Where M is the similarity metric between two bands and j ranges from 2 to d. The same process is repeated
until k subsets are created. From these k subsets, k bands are selected, one from each subset.
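
The subset creation rule can be sketched as a single pass over the bands. The code below is a minimal illustration rather than the authors' implementation. It makes three assumptions that are not fixed by the text: the case M = T starts a new subset, the representative of each subset is its reference (first) band, and the toy metric is the absolute difference of band means. The names `create_subsets`, `select_representatives` and `mean_difference` are hypothetical.

```python
def mean_difference(band_a, band_b):
    """Toy metric (assumption): absolute difference of the band means."""
    def mean(band):
        return sum(sum(row) for row in band) / (len(band) * len(band[0]))
    return abs(mean(band_a) - mean(band_b))


def create_subsets(bands, metric, threshold):
    """Group bands following equation (5); returns lists of band indices."""
    subsets = [[0]]    # the first band opens subset 1 and is its reference
    ref = 0
    for j in range(1, len(bands)):
        if metric(bands[ref], bands[j]) < threshold:
            subsets[-1].append(j)      # stays in the current subset
        else:
            subsets.append([j])        # opens the next subset ...
            ref = j                    # ... and becomes its reference band
    return subsets


def select_representatives(subsets):
    """Assumption: the reference band of each subset is its representative."""
    return [subset[0] for subset in subsets]


# Six toy 1x1 "bands" with intensities 10, 11, 12, 30, 31, 50.
bands = [[[10]], [[11]], [[12]], [[30]], [[31]], [[50]]]
subsets = create_subsets(bands, mean_difference, threshold=5)

print(subsets)                           # [[0, 1, 2], [3, 4], [5]]
print(select_representatives(subsets))   # [0, 3, 5]
```

In this sketch the number of subsets k is a consequence of the threshold T, so obtaining a target k (for example 80 bands out of 220) would require tuning T.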


3. HIERARCHICAL IMAGE FUSION TECHNIQUE 

In the hierarchical image fusion technique [8], the entire hyperspectral data set is partitioned into P
subsets, where P = K/M, K is the number of bands in the data set and M is the number of bands in each subset.
Image fusion is first carried out independently on these P subsets to form P fused images. These P images
are then divided into subsets again and used as input for the second stage of fusion. This procedure is
repeated in a hierarchical manner to generate the final fusion result in a few stages. The flow diagram of
hierarchical image fusion is shown in Figure 2. The fused image F at any stage is a linear combination of the
input images as shown below:

$$F(x, y) = \sum_{k} w_k(x, y)\, I_k(x, y) \quad \text{and} \quad \sum_{k} w_k(x, y) = 1 \;\; \forall (x, y) \qquad (6)$$

where the sums run over the input images I_k (k = 1, 2, ...) of that stage, w_k(x, y) is the normalized weight
for the pixel at location (x, y), and F(x, y) is the fused image [9, 10].
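
The staged fusion can be sketched as follows. This is an illustration under stated assumptions, not the method of [8]: the weight function is left as a parameter because the text does not define it, the raw weights are assumed positive and are normalized per pixel so that they sum to 1, and the last group of a stage may hold fewer than M bands when M does not divide the number of inputs. The names `fuse`, `hierarchical_fusion`, `uniform_weight` and `group_size` are hypothetical.

```python
def uniform_weight(band, x, y):
    """Placeholder weight (assumption): every band counts equally."""
    return 1.0


def fuse(bands, weight_fn):
    """One fusion step: per-pixel weighted sum with normalized weights."""
    rows, cols = len(bands[0]), len(bands[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            raw = [weight_fn(band, x, y) for band in bands]
            total = sum(raw)   # dividing by the total makes the weights sum to 1
            fused[x][y] = sum(w / total * band[x][y]
                              for w, band in zip(raw, bands))
    return fused


def hierarchical_fusion(bands, group_size, weight_fn):
    """Fuse groups of `group_size` bands stage by stage until one image is left."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    while len(bands) > 1:
        bands = [fuse(bands[i:i + group_size], weight_fn)
                 for i in range(0, len(bands), group_size)]
    return bands[0]


# Four constant 2x2 bands; with groups of two, stage 1 yields two images.
bands = [[[v, v], [v, v]] for v in (10, 20, 30, 40)]
fused = hierarchical_fusion(bands, group_size=2, weight_fn=uniform_weight)

print(fused)   # [[25.0, 25.0], [25.0, 25.0]]
```

With uniform weights each stage reduces to a plain average; a data-dependent weight function would replace `uniform_weight` without changing the staging logic.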


Figure 2. Hierarchical image fusion


4. HIERARCHICAL CLUSTERING TECHNIQUE

The hierarchical clustering technique segments the data into a multilevel hierarchy, where the
clusters at one level are combined into the clusters at the next level, forming a hierarchical cluster tree or
dendrogram. The level of the hierarchy that is used depends on the application in which the hierarchical
clustering algorithm is implemented, and the dendrogram shows the cluster tree [11]. The basic steps of the
hierarchical clustering algorithm applied to the fused hyperspectral image, with their implementation in
Matlab, are as follows:


a. Compute the similarity or dissimilarity between pixels in the image.
   This can be done using the ‘pdist’ function in Matlab with different distance metrics such as ‘euclidean’,
   ‘seuclidean’, ‘cityblock’, ‘minkowski’, ‘cosine’, etc.

b. After finding the distances, group the pixels into a binary hierarchical cluster tree.
   This can be done using the ‘linkage’ function in Matlab with a method such as ‘average’, ‘centroid’,
   ‘complete’, ‘median’, ‘single’ or ‘weighted’ for computing the distance between clusters. The linkage
   function uses the distance information generated in step (a) to determine the proximity of pixels to
   each other. As pixels are paired into binary clusters, the newly formed clusters are grouped into larger
   clusters until a hierarchical tree is formed.

c. After construction of the cluster tree, determine where to cut the hierarchical tree into clusters.
   This can be done using the ‘cluster’ function in Matlab. The cluster function can create these clusters by
   detecting natural groupings in the hierarchical tree or by cutting off the hierarchical tree at an arbitrary
   point. This function has a parameter ‘maxclust’ to create a specified number of clusters.

The hierarchical cluster tree generated by the above three functions can be visualized graphically as a
dendrogram plot using the ‘dendrogram’ function.
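
The three steps above can also be sketched outside Matlab. The following pure-Python example is a minimal illustration, not the authors' implementation: it assumes scalar pixel intensities, absolute difference as the distance, average linkage, and a cut at a maximum number of clusters analogous to ‘maxclust’. The function name `agglomerative_clusters` is hypothetical, and the brute-force search is only practical for very small inputs.

```python
def agglomerative_clusters(values, max_clusters):
    """Average-linkage agglomerative clustering of scalar pixel values.

    Returns one cluster label (1, 2, ...) per input value.
    """
    clusters = [[i] for i in range(len(values))]   # one cluster per pixel

    def average_distance(c1, c2):
        # Mean absolute difference over all pairs drawn from the two clusters.
        total = sum(abs(values[i] - values[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    # Merge the two closest clusters until the requested number remains.
    while len(clusters) > max_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_distance(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(values)
    for label, members in enumerate(clusters, start=1):
        for i in members:
            labels[i] = label
    return labels


image = [[10, 12], [200, 205]]                 # toy 2x2 fused image
pixels = [v for row in image for v in row]     # flatten to a pixel list
labels = agglomerative_clusters(pixels, max_clusters=2)

print(labels)   # [1, 1, 2, 2]
```

Reshaping the label list back to the image dimensions gives the segmented image; here the two dark pixels form one segment and the two bright pixels the other.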


5. EXPERIMENTAL RESULTS

The proposed methodology is tested on the Indian Pines hyperspectral image data set collected from
[12], which contains 220 spectral bands. Dimensionality reduction is done using the proposed subset
methodology with different similarity metrics, and 80 bands are selected from the 220 bands. After
dimensionality reduction, hierarchical image fusion is carried out to create a single image. This image is
segmented using the hierarchical clustering algorithm with a maximum of 16 clusters. In the tree generated
by the hierarchical clustering technique, any two pixels in the original image are linked together at some
level, and the height of the link denotes the distance between the two clusters that contain those pixels. This
height is known as the cophenetic distance. The correlation between the cophenetic distances obtained from
the linkage function and the original distances generated by the pdist function is known as the cophenetic
correlation coefficient. The closer the value of the cophenetic correlation coefficient is to 1, the more
accurately the clustering solution reflects the data [13]. Table 1 shows the cophenetic correlation coefficient
obtained with different metrics in the pdist and linkage functions used in the hierarchical clustering
technique on the hyperspectral image. Figure 3 shows the code snippet of hierarchical clustering. Table 2
shows the classification accuracies obtained by hierarchical clustering with different similarity metrics used
in dimensionality reduction, compared with the ground truth information available in [14, 15]. The
qualitative analysis of the proposed method on the Indian Pines hyperspectral data set is shown in Figure 4.
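
To make the cophenetic correlation coefficient concrete, the sketch below computes it for a handful of scalar values. It is a simplified illustration under stated assumptions, not a replacement for Matlab's ‘cophenet’: it uses absolute difference as the distance and single linkage only, records for every pair of points the height at which the two points are first merged, and then takes the Pearson correlation between those heights and the original distances. The function name `cophenetic_correlation` is hypothetical.

```python
import math
from itertools import combinations


def cophenetic_correlation(values):
    """Single-linkage clustering of scalars, then the correlation between
    original pairwise distances and cophenetic (merge-height) distances."""
    def key(i, j):
        return (i, j) if i < j else (j, i)

    pairs = list(combinations(range(len(values)), 2))
    original = {p: abs(values[p[0]] - values[p[1]]) for p in pairs}
    cophenetic = {}
    clusters = [[i] for i in range(len(values))]

    while len(clusters) > 1:
        # Single linkage: cluster distance is the smallest pairwise distance.
        d, a, b = min(
            (min(original[key(i, j)] for i in clusters[a] for j in clusters[b]), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        for i in clusters[a]:
            for j in clusters[b]:
                cophenetic[key(i, j)] = d   # height at which i and j join
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    y = [original[p] for p in pairs]
    z = [cophenetic[p] for p in pairs]
    mean_y, mean_z = sum(y) / len(y), sum(z) / len(z)
    cov = sum((u - mean_y) * (v - mean_z) for u, v in zip(y, z))
    var_y = sum((u - mean_y) ** 2 for u in y)
    var_z = sum((v - mean_z) ** 2 for v in z)
    return cov / math.sqrt(var_y * var_z)


c = cophenetic_correlation([0, 1, 10])
print(round(c, 4))
```

For the three values 0, 1 and 10 the tree joins 0 and 1 at height 1 and then joins the value 10 at height 9, so the cophenetic distances (1, 9, 9) follow the original distances (1, 10, 9) closely and the coefficient is close to 1.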


Table 1. Cophenetic correlation measures

Pdist              Linkage    Cophenetic correlation measure
Euclidean          Single     0.8742
Cityblock          Average    0.9197
Squaredeuclidean   Centroid   0.9015
Cosine             Median     0.8698


Table 2. Classification accuracies of different similarity metrics used in dimensionality reduction step

Similarity metric in dimensionality reduction algorithm   Classification accuracy
API                                                       89.6%
MI                                                        90.1%
HS                                                        92.3%
CS                                                        94.9%


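% I4: fused image (variable name as read from the source listing)
Y = pdist(double(I4), 'squaredeuclidean');   % pairwise distances
squareform(Y);                               % square-matrix form of Y (result not stored)

Z = linkage(Y, 'centroid');                  % binary hierarchical cluster tree

dendrogram(Z);                               % plot the cluster tree

c = cophenet(Z, Y);                          % cophenetic correlation coefficient

T = cluster(Z, 'maxclust', 2);               % cut the tree into at most 2 clusters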

Figure 3. Code Snippet of Hierarchical clustering 





Figure 4. Segmentation of hyperspectral image using hierarchical clustering: (a) Original image band 100
(Indian Pines dataset), (b) Fused image, (c) Segmented image


6. CONCLUSIONS

In this paper a framework for hyperspectral image segmentation is presented. The framework is carried
out in three stages. The first stage is dimensionality reduction using the subset selection method, which
selects informative bands and leaves out the bands that convey less descriptive information. The second
stage is hierarchical image fusion, which generates a single informative band. The third stage is
segmentation using the hierarchical clustering algorithm. Existing methods segment hyperspectral data sets
by selecting a limited number of bands, normally fewer than seven. The accuracy of any segmentation
algorithm decreases if the number of spectral bands increases. The framework presented in this paper
provides a methodology for segmenting the hyperspectral data set by incorporating all the information
existing in the original bands rather than selecting only some spectral bands. The results presented in this
paper show the performance of the hierarchical clustering algorithm when different similarity metrics are
used in the dimensionality reduction algorithm.